Medical Expenditure Panel Survey: Data Visualization on HIV Population in the United States

Jennifer Pham

2023-05-08


This tutorial is a combined product of two deliverables I completed at the end of my practicum course in the Master in Public Health (MPH) program at the University of California San Diego (UCSD).

The practicum was conducted at UCSD Skaggs School of Pharmacy & Pharmaceutical Sciences (SSPPS) and VA Health Economics Resource Center (HERC). To design an interactive data visualization of a population health tool on the prevalence of human immunodeficiency virus (HIV) in the United States, we used Medical Expenditure Panel Survey (MEPS) data, which is publicly available and should be accessible to anyone who is interested in public health. This open-source data visualization tool will allow others to reproduce, verify or validate other studies or reports.

Introduction

Researchers have used the Agency for Healthcare Research and Quality (AHRQ) Medical Expenditure Panel Survey (MEPS) data for critical public health investigations that have led to more robust hypothesis generating studies. MEPS is a dataset that uses a complex survey design to capture healthcare expenditures among a representative population in the United States. It is administered annually and contains information on the respondents’ demographics, income, health, and healthcare expenditures. However, there is a lack of engagement with MEPS data by pharmacy students and researchers at SSPPS. This project will introduce researchers at SSPPS to use MEPS data to perform original investigations that will lead to potential funding.

Access MEPS Data

There are various types of dataset you can access in the MEPS data files page. For my practicum, I focused on the Full-Year Consolidated Data files and the Medical Condition files.

Figure 1 - MEPS data files used for this practicum

Load MEPS data into R

1. Install R packages

There are multiple packages you need to install for this project.

Notes: You only need to install these once

# install.packages("foreign")
# install.packages("devtools")
# install.packages("tidyverse")
# install.packages("gtsummary")
# install.packages("readr")
# install.packages("readxl")
# install.packages("haven")
# install.packages("survey")
# install.packages("RSQLite")
# install.packages("dplyr")
# install.packages("plotly")
# devtools::install_github("e-mitchell/meps_r_pkg/MEPS") #This will install the MEPS package for R

2. Load R libraries

After installing the packages, you need to load the following libraries every time you restart R.

library("foreign")
library("devtools")
library("tidyverse")
library("gtsummary")
library("readr")
library("readxl")
library("haven")
library("survey")
library("RSQLite")
library("dplyr")
library("plotly")
library("MEPS")

3. Load MEPS data files into R

The MEPS package allows us to load MEPS data files directly from AHRQ MEPS website. In this example, I use the read_MEPS function to load the MEPS 2020 Full-Year Consolidated file named h224 and the 2020 Medical Condition file named h222 directly onto R.

hc2020 = read_MEPS(file = "h224")
medcon2020 = read_MEPS(file = "h222")

Analyze MEPS data

1. Reduce dataframe: keep your variables of interest

There are thousands of variables in the MEPS 2020 Full-Year Consolidated file and 2020 Medical Condition file. Using the subset function, I generate smaller data frames containing only a few variables I am interested in for this practicum.

  • From the MEPS 2020 Full-Year Consolidated file h224, I create a smaller data frame called hc2020 including the variables DUPERSID, VARPSU, VARSTR, PERWT20F, AGE20X, SEX, HISPANX, POVCAT20, EDUCYR, WAGEP20X, TTLP20X, TOTEXP20, TOTSLF20, RXEXP20, RXSLF20 & INSCOV20.

  • From the 2020 Medical Condition file h222, I create a smaller data frame called medcon2020 including the variables DUPERSID & ICD10CDX.
hc2020 <- subset (hc2020, select = c("DUPERSID", "VARPSU", "VARSTR", "PERWT20F", "AGE20X", "SEX", "HISPANX", "POVCAT20", "EDUCYR", "WAGEP20X", "TTLP20X", "TOTEXP20", "TOTSLF20", "RXEXP20", "RXSLF20", "INSCOV20"))
medcon2020 <- subset(medcon2020, select = c( "DUPERSID", "ICD10CDX"))

2. Add labels for categorical variables

Values of categorical variables are numerically coded in MEPS data, which does not give clear information when we visually inspect these variables.

Figure 2 - Example of a categorical variable in the MEPS 2020 Full-Year Consolidated file h224

table(hc2020$EDUCYR)
## 
##  -15   -8   -7   -1    0    1    2    3    4    5    6    7    8    9   10   11 
##    1  150   22 1948  854  425  429  431  435  436  661  453  671  813  871 1061 
##   12   13   14   15   16   17 
## 6421 1367 2811  700 4061 2784

To clarify what each value represents, we can add labels to these categorical variables using the functions factor, as.factor & ifelse

  • Add labels for EDUCYR variable in the MEPS 2020 Full-Year Consolidated file:
hc2020$EDUCYR_CAT20 = as.factor(ifelse (0 == hc2020$EDUCYR, "No school/kindergarten only", 
                                ifelse (1 <= hc2020$EDUCYR & hc2020$EDUCYR <= 8, "Elementary grades 1-8", 
                                ifelse (9 <= hc2020$EDUCYR & hc2020$EDUCYR <= 12, "High school grades 9-12",
                                ifelse (13 <= hc2020$EDUCYR & hc2020$EDUCYR <= 15, "1-3 years college", 
                                ifelse (16 <= hc2020$EDUCYR, "4+ years of college",  
                                ifelse (-15 == hc2020$EDUCYR, "Cannot be computed",
                                ifelse (-8 ==hc2020$EDUCYR , "DK",
                                ifelse (-7 ==hc2020$EDUCYR, "Refused", 
                                ifelse (-1 ==hc2020$EDUCYR, "Inapplicable", "NA"))))))))))

table(hc2020$EDUCYR_CAT20)
## 
##           1-3 years college         4+ years of college 
##                        4878                        6845 
##          Cannot be computed                          DK 
##                           1                         150 
##       Elementary grades 1-8     High school grades 9-12 
##                        3941                        9166 
##                Inapplicable No school/kindergarten only 
##                        1948                         854 
##                     Refused 
##                          22
  • Add labels for SEX variable in the MEPS 2020 Full-Year Consolidated file:
hc2020$SEX <- factor(hc2020$SEX, levels = c(1, 2), labels = c("Male", "Female"))

table(hc2020$SEX)
## 
##   Male Female 
##  13258  14547
  • Add labels for HISPANX variable in the MEPS 2020 Full-Year Consolidated file:
hc2020$HISPANX <- factor(hc2020$HISPANX, levels = c(1,2), labels = c("Hispanic", "Not Hispanic"))

table(hc2020$HISPANX)
## 
##     Hispanic Not Hispanic 
##         6814        20991
  • Add labels for POVCAT20 variable in the MEPS 2020 Full-Year Consolidated file:
hc2020$POVCAT20 <- factor(hc2020$POVCAT20, levels = c(1, 2, 3, 4, 5), labels = c("Poor", "Near poor", "Low income", "Middle income", "High income"))

table(hc2020$POVCAT20)
## 
##          Poor     Near poor    Low income Middle income   High income 
##          4683          1372          4096          7617         10037
  • Add labels for INSCOV20 variable in the MEPS 2020 Full-Year Consolidated file:
hc2020$INSCOV20 <- factor(hc2020$INSCOV20, levels = c(1, 2, 3), labels = c("Any private", "Public only", "Uninsured"))

table(hc2020$INSCOV20)
## 
## Any private Public only   Uninsured 
##       16056        9539        2210

3. Merge MEPS data files

We use the left_join function from the dply package to merge the MEPS 2020 Full-Year Consolidated file and 2020 Medical Condition file via their matching variable DUPERSID. We name the merged data set hc_medcon2020.
Notice that in the MEPS 2020 Full-Year Consolidated file, the DUPERSIDis unique to each person in each row and thus, is not repeatable. However, in the 2020 Medical Condition file, if a person has multiple ICD10 diagnostic codes, these diagnoses are grouped by their DUPERSID. As a result, the DUPERSID can be repeated for a person. This merging method is calld 1 to many match.

hc_medcon2020 <- left_join(hc2020, medcon2020, by = c("DUPERSID")) 

nrow(hc_medcon2020)
## [1] 90038

For more detailed explanations one merging MEPS data, visit this additional tutorial

4. Identify HIV population

    4.1 Create an indicator for HIV ICD10 diagnostic code

The HIV ICD10 diagnostic code in the 2020 Medical Condition file is Z21. Using that information, we generate a binary predictor HIV to differentiate patients with or without HIV diagnosis among our (potentially) repeated rows.

hc_medcon2020$HIV[hc_medcon2020$ICD10CDX == "Z21"] = 1 
hc_medcon2020$HIV[hc_medcon2020$ICD10CDX != "Z21" | is.na(hc_medcon2020$ICD10CDX)] = 0

table(hc_medcon2020$HIV, useNA = "always") ## visualize the number of patients with and without HIV. 
## 
##     0     1  <NA> 
## 89986    52     0
##The 'useNA' function is used to capture all patients who did not have a diagnosis code (i.e. ICD10CDX = 'NA')
    4.2 Calculate the number of times “Z21” appears for a unique patient
hc_medcon2020 <- hc_medcon2020 %>% 
  group_by(DUPERSID) %>%
  mutate(HIV_indicator = sum(HIV == "1", na.rm = TRUE)) %>% 
  ungroup 

table(hc_medcon2020$HIV_indicator)
## 
##     0     1 
## 89695   343
    4.3 Transform the binary variable HIV in the hc_medcon2020 dataframe into a new variable that only has 0 and 1
hc_medcon2020 <- hc_medcon2020 %>%
  group_by(DUPERSID) %>% 
  mutate(HIV_binary = ifelse(HIV_indicator >= 1, 1, 0), na.rm = TRUE) %>% 
  ungroup 

table(hc_medcon2020$HIV_binary)
## 
##     0     1 
## 89695   343
    4.4 Collapse dataframe to a single unique patient
MEPS2020 <- hc_medcon2020 %>% 
  group_by (DUPERSID, SEX, HISPANX, POVCAT20, EDUCYR_CAT20, INSCOV20) %>% summarize_all(list(mean))

table(MEPS2020$HIV_binary) #get sample size_HIV
## 
##     0     1 
## 27753    52

5. Applying weights

  • Using this tutorial, we can apply weights to obtain the weighted estimate of HIV patients in the United States in 2020.
options(survey.lonely.psu = 'adjust')

mepsdsgn = svydesign(id = ~VARPSU, 
                     strata = ~VARSTR, 
                     weights = ~PERWT20F, 
                     data = MEPS2020, 
                     nest = TRUE)

svytable(~MEPS2020$HIV_binary, design = mepsdsgn) 
## MEPS2020$HIV_binary
##           0           1 
## 328049372.3    495924.5
  • We use the tbl_svysummary function to generate the contingency table
mepsdsgn %>% 
  tbl_svysummary(by = HIV_binary, 
                 percent = "column", 
                 include = c(AGE20X, SEX, HISPANX, POVCAT20, EDUCYR_CAT20, WAGEP20X, TTLP20X, TOTEXP20, TOTSLF20, RXEXP20, RXSLF20, INSCOV20), 
                 statistic = list(all_continuous() ~ "{mean} ({sd})", all_categorical() ~ "{n} ({p}%)"), 
                 digits = list(all_continuous() ~ c(0, 0), 
                               all_categorical() ~ c(0, 1))) %>%
                                  add_p() %>% 
                                    modify_header(label = "**Variable**", 
                                                  all_stat_cols() ~ "**{level}**<br>N = {n} ({style_percent(p, digits=1)}%)") %>% 
                                                    modify_caption("2020 Weighted descriptive statistics between HIV and non-HIV patients") %>% 
                                                      bold_labels()
2020 Weighted descriptive statistics between HIV and non-HIV patients
Variable 0
N = 328049372 (99.8%)1
1
N = 495925 (0.15%)1
p-value2
AGE20X 39 (23) 51 (15) <0.001
SEX 0.004
    Male 160,568,843 (48.9%) 392,146 (79.1%)
    Female 167,480,529 (51.1%) 103,778 (20.9%)
HISPANX 0.4
    Hispanic 61,540,707 (18.8%) 71,780 (14.5%)
    Not Hispanic 266,508,665 (81.2%) 424,144 (85.5%)
POVCAT20 0.007
    Poor 37,528,003 (11.4%) 119,819 (24.2%)
    Near poor 12,575,554 (3.8%) 52,774 (10.6%)
    Low income 40,588,183 (12.4%) 75,320 (15.2%)
    Middle income 92,755,652 (28.3%) 78,246 (15.8%)
    High income 144,601,980 (44.1%) 169,767 (34.2%)
EDUCYR_CAT20 0.5
    1-3 years college 63,806,617 (19.5%) 112,365 (22.7%)
    4+ years of college 88,958,603 (27.1%) 175,460 (35.4%)
    Cannot be computed 11,376 (0.0%) 0 (0.0%)
    DK 1,318,985 (0.4%) 0 (0.0%)
    Elementary grades 1-8 41,630,025 (12.7%) 13,302 (2.7%)
    High school grades 9-12 96,640,895 (29.5%) 194,797 (39.3%)
    Inapplicable 25,823,204 (7.9%) 0 (0.0%)
    No school/kindergarten only 9,734,457 (3.0%) 0 (0.0%)
    Refused 125,211 (0.0%) 0 (0.0%)
WAGEP20X 27,609 (42,931) 34,627 (57,260) 0.4
TTLP20X 36,111 (45,955) 49,069 (72,824) 0.083
TOTEXP20 6,216 (21,487) 39,334 (30,719) <0.001
TOTSLF20 809 (3,924) 1,769 (3,594) <0.001
RXEXP20 1,400 (11,614) 27,961 (22,421) <0.001
RXSLF20 138 (942) 1,154 (3,287) <0.001
INSCOV20 0.054
    Any private 217,082,727 (66.2%) 273,282 (55.1%)
    Public only 89,575,740 (27.3%) 222,642 (44.9%)
    Uninsured 21,390,906 (6.5%) 0 (0.0%)
1 Mean (SD); n (%)
2 Wilcoxon rank-sum test for complex survey samples; chi-squared test with Rao & Scott’s second-order correction

Repeat the same process for the files from 2016 to 2019

Year Full-Year Consolidated file Medical Condition file
2016 hc192 hc190
2017 hc201 hc199
2018 hc209 hc207
2019 hc216 hc214

Plot HIV population data - in progress

Using the plot_ly package, we create some data visualizations using the results from the contingency table.

1. Stacked bar chart: visualizing poverty categories in HIV population

year <- c('2016', '2017', '2018', '2019', '2020')
poor <- c(186160, 162488, 123302, 154536, 119819)
near_poor <- c(30870, 58830, 81270, 35477, 52774)
low_income <- c(56948, 69803, 63544, 81215, 75320)
middle_income <- c(43584, 55957, 49623, 45672, 78246)
high_income <- c(29408, 143485, 98212, 75397, 169767)

data1 <- data.frame(year, poor, near_poor, low_income, middle_income, high_income)

fig1 <- data1 %>% plot_ly()
fig1 <- fig1 %>% add_trace(x = ~ year, 
                           y = ~ poor, 
                           type = 'bar', 
                           width = 0.7, 
                           name = " Poor",
                           text = poor,
                           textfont = list(size = 14),
                           textposition = 'auto',
                           hovertemplate = paste('%{y:,.3~s} HIV patients with <br> <b><em>poor income</b></em> status in %{x}'),
                           hoverlabel = list(bordercolor = 'transparent', 
                                             font = list(size = 18, color = 'white')),
                           marker = list(color = 'steelblue3', 
                                         line = list(color = 'gray', width = 1)))

fig1 <- fig1 %>% add_trace(x = ~ year, 
                           y = ~ near_poor, 
                           type = 'bar', 
                           width = 0.7, 
                           name = "Near poor",
                           text = near_poor,
                           textfont = list(size = 14),
                           textposition = 'auto',
                           hovertemplate = paste('%{y:,.3~s} HIV patients with <br> <b><em>near poor income</b></em> status in %{x}'),
                           hoverlabel = list(bordercolor = 'transparent', 
                                             font = list(size = 18, color = '#0000EE')),
                           marker = list(color = 'lightblue', 
                                         line = list(color = 'gray', width = 1)))

fig1 <- fig1 %>% add_trace(x = ~year, 
                           y = ~ low_income, 
                           type = 'bar', 
                           width = 0.7, 
                           name = "Low income",
                           text = low_income, 
                           textfont = list(size = 14),
                           textposition = 'auto',
                           hovertemplate = paste('%{y:,.3~s} HIV patients with <br> <b><em>low income</b></em> status in %{x}'),
                           hoverlabel = list(bordercolor = 'transparent', 
                                             font = list(size = 18, color = 'black')),
                           marker = list(color = 'gainsboro', 
                                         line = list(color = 'gray', width = 1)))

fig1 <- fig1 %>% add_trace(x = ~ year, 
                           y = ~ middle_income, 
                           type = 'bar', 
                           width = 0.7, 
                           name = "Middle income",
                           text = middle_income, 
                           textfont = list(size = 14),
                           textposition = 'auto',
                           hovertemplate = paste('%{y:,.3~s} HIV patients with <br> <b><em>middle income</b></em> status in %{x}'),
                           hoverlabel = list(bordercolor = 'transparent', 
                                             font = list(size = 18, color = 'firebrick')),
                           marker = list(color = 'pink', 
                                         line = list(color = 'gray', width = 1)))

fig1 <- fig1 %>% add_trace(x = ~ year, 
                           y = ~ high_income, 
                           type = 'bar', 
                           width = 0.7, 
                           name = "High income",
                           text = high_income, 
                           textfont = list(size = 14),
                           textposition = 'auto',
                           hovertemplate = paste('%{y:,.3~s} HIV patients with <br> <b><em>high income</b></em> status in %{x}'),
                           hoverlabel = list(bordercolor = 'transparent', 
                                             font = list(size = 18, color = 'white')),
                           marker = list(color = 'firebrick', 
                                         line = list(color = 'gray', width = 1)))

fig1 <- fig1 %>% layout(title = "Number of HIV patients by poverty category (2016 - 2020)",
                        font = list(size = 16),
                        barmode = 'stack',
                        plot_bgcolor= 'white',
                        xaxis = list(title = "Years", 
                                     size = 12, 
                                     tickfont = list(size = 18),
                                     categoryorder = "array",
                                     categoryarray = c('2016', '2017', '2018', '2019', '2020')),  ## Order & label the x-axis
                        yaxis = list(title = "Number of HIV patients", 
                                     size = 12, 
                                     tickfont = list(size = 12)),
                        margin = list(l = 20, r = 10, b = 10, t = 40))
fig1

2. Vertical grouped bar chart: races among HIV population

year <- c('2016', '2017', '2018', '2019', '2020')  
hispanic <- c(59214, 124513, 84416, 66646, 71780)                      
not_hispanic <- c(287756, 366051, 331534, 325651, 424144)                        
data2 <- data.frame(year, hispanic, not_hispanic)

fig2 <- data2 %>% plot_ly()
fig2 <- fig2 %>% add_trace(x = ~ year, 
                           y = ~ hispanic, 
                           type = 'bar', 
                           width = 0.5,
                           name = "Hispanic",
                           text = hispanic,
                           textposition = 'auto',
                           hovertemplate = paste('%{y:,.3~s} <b><em>Hispanic</b></em> HIV patients in %{x}'),
                           hoverlabel = list(bordercolor = 'black', 
                                             font = list(size = 18, color = 'black')),
                          marker = list(color = "#5F9EA0",
                                        line = list(color = 'gray', width = 0.5)))

fig2 <- fig2 %>% add_trace(x = ~ year,
                           y = ~ not_hispanic, 
                           type = 'bar', 
                           width =0.5,
                           name = "Not Hispanic",
                           text = not_hispanic, 
                           textposition = 'auto',
                           hovertemplate = paste('%{y:,.3~s} <b><em>Non-Hispanic</b></em> HIV patients in %{x}'),
                           hoverlabel = list(bordercolor = 'black', 
                                             font = list(size = 18, color = 'black')),
                           marker = list(color = "#D1EEEE", 
                                         line = list(color = 'gray', width = 0.5)))

fig2 <- fig2 %>% layout(title = "Number of HIV patients by races (2016-2020)", 
                        font = list(size = 16),
                        barmode = 'group',
                        xaxis = list(title = "Years", 
                                     size = 12, 
                                     tickfont = list(size = 18)),
                        yaxis = list(title = "Number of HIV patients", 
                                     size = 12, 
                                     tickfont = list(size=12)),
                        margin = list(l = 50, r = 30, b = 10, t = 40))

fig2

3. Bubble chart

    3.1 Total health care and prescription expenditures, total person income and person wage among HIV population
year <- c('2016', '2017', '2018', '2019', '2020')  
hiv_totexp <- c(32462, 38988, 57266, 44048, 39334)
hiv_rxexp <- c(26110, 25979, 45816, 36463, 27961)
hiv_wage <- c(11484, 21369, 19527, 27573, 34627)
hiv_income <- c(19877, 33938, 33123, 40410, 49069)

data3.1 <- data.frame(year, hiv_totexp, hiv_rxexp, hiv_wage, hiv_income)

fig3.1  <- plot_ly(data3.1, 
                   x = ~ year, 
                   y = ~ hiv_totexp, 
                   name = 'HIV total<br>health care<br>expenditure', 
                   text = hiv_totexp, 
                   textposition = 'auto',
                   size = ~ hiv_totexp,
                   type = 'scatter', 
                   mode = 'markers',
                   marker = list (sizemode = "diameter"),
                   hovertemplate = paste('%{x}, $%{text}'),
                   hoverlabel = list(bordercolor = 'transparent', 
                                     font = list(size = 16, color = 'white')))
                   
fig3.1 <- fig3.1 %>% add_trace(x = ~ year, 
                               y = ~ hiv_rxexp, 
                               name = 'HIV total<br>prescription<br>expenditure', 
                               text = hiv_rxexp, 
                               textposition = 'auto',
                               size = ~ hiv_rxexp,
                               mode = "markers",
                               marker = list (sizemode = "diameter"),
                               hovertemplate = paste('%{x}, $%{text}'),
                               hoverlabel = list(bordercolor = 'transparent', 
                                                 font = list(size = 16, color = 'white')))

fig3.1 <- fig3.1 %>% add_trace(x = ~ year, 
                               y = ~ hiv_income, 
                               name = 'HIV person<br>total income', 
                               text = hiv_income, 
                               textposition = 'auto',
                               size = ~ hiv_income,
                               mode = "markers",
                               marker = list (sizemode = "diameter"),
                               hovertemplate = paste('%{x}, $%{text}'),
                               hoverlabel = list(bordercolor = 'transparent', 
                                               font = list(size = 16, color = 'white')))

fig3.1 <- fig3.1 %>% add_trace(x = ~ year, 
                             y = ~ hiv_wage, 
                             name = 'HIV person<br>wage', 
                             text = hiv_wage, 
                             textposition = 'auto',
                             size = ~ hiv_wage,
                             mode = "markers",
                             marker = list (sizemode = "diameter"),
                             hovertemplate = paste('%{x}, $%{text}'),
                             hoverlabel = list(bordercolor = 'transparent', 
                                               font = list(size = 16, color = 'white')))

fig3.1 <- fig3.1 %>% layout(title = "Medical Expenditures and Income among HIV Patients (2016-2020)", 
                            font = list(size = 14),
                            xaxis = list(title = "Years", 
                                         size = 12, 
                                         tickfont = list(size = 18)),
                            yaxis = list(title = "US Dollar", 
                                         size = 12, 
                                         tickfont = list(size = 12)),
                            margin = list(l = 10, r = 10, b = 10, t = 40))

fig3.1
    3.2 Health insurance status among HIV population
year <- c('2016', '2017', '2018', '2019', '2020')
hiv_private <- c(75413, 218633, 220333, 140474, 273282)
hiv_public <- c(243215, 246815, 178129, 251824, 222642)
hiv_uninsured <- c(28343, 25116, 17488, 0, 0)

data3.2 <- data.frame(year, hiv_private, hiv_public, hiv_uninsured)

fig3.2  <- plot_ly(data3.2, 
                   x = ~ year, 
                   y = ~ hiv_private, 
                   type = 'scatter', 
                   name = "Private ",
                   text = "Any private insurance",
                   textposition = 'auto',
                   size = ~hiv_private,
                   mode = 'markers',
                   marker = list(color='#FF82AB', 
                               line = list(color ="#FF82AB"), 
                               sizemode = "diameter", 
                               opacity = 0.6),
                   hovertemplate = paste('%{y:,.3~s} HIV patients with <b><em>private insurance</b></em> in %{x}'),
                   hoverlabel = list(bordercolor = 'black', 
                                   font = list(size = 18, color = 'black')))

fig3.2 <- fig3.2 %>% add_trace(x = ~ year, 
                               y = ~ hiv_public, 
                               type = 'scatter', 
                               name = "Public",
                               text ="Public insurance only",
                               textposition = 'auto',
                               size = ~hiv_public,
                               mode = 'marker',
                               marker = list(color='#2E8B57', 
                                         line = list(color ="#2E8B57"), 
                                         sizemode = "diameter", 
                                         opacity = 0.6),
                               hovertemplate = paste('%{y:,.3~s} HIV patients with <b><em>public insurance</b></em> in %{x}'),
                               hoverlabel = list(bordercolor = 'black', 
                                             font = list(size = 18, color = 'white')))

fig3.2 <- fig3.2 %>% add_trace(x = ~ year, 
                               y = ~ hiv_uninsured, 
                               type = 'scatter', 
                               name = "No insurance",
                               text = "Uninsured",
                               textposition = 'auto',
                               size = ~hiv_uninsured,
                               mode = 'marker',
                               marker = list(color='#EEB422', 
                                             line = list(color ="#EEB422"), 
                                             sizemode = "diameter", 
                                             opacity = 0.7),
                               hovertemplate = paste('%{y:,.3~s} HIV patients with <b><em>no insurance</b></em> in %{x}'),
                               hoverlabel = list(bordercolor = 'black', 
                                             font = list(size = 18, color = 'black')))

fig3.2 <- fig3.2 %>% layout(title = 'Health Insurance Coverage Status among HIV patients (2016-2020)',
                            font = list(size = 14),
                            yaxis = list(title = "Number of HIV patients", 
                                     size = 12, 
                                     tickfont = list(size = 12)),
                            xaxis = list(title = "Years", 
                                     size = 12, 
                                     tickfont = list(size = 18)),
                            margin = list(l = 30, r = 10, b = 10, t =40))

fig3.2

4. Line plot - total health care and prescription expenditures, total person income and person wage among HIV population

fig4  <- plot_ly(data3.1, 
                 x = ~ year, 
                 y = ~ hiv_totexp, 
                 name = 'HIV total<br>health care<br>expenditure', 
                 text = hiv_totexp, 
                 textposition = 'auto',
                 type = 'scatter', 
                 mode = 'lines+markers',
                 line = list (width = 2),
                 hovertemplate = paste('%{x}, $%{text}'),
                 hoverlabel = list(bordercolor = 'transparent', 
                                   font = list(size = 18, color = 'white')))
                   

fig4 <- fig4 %>% add_trace(x = ~ year, 
                           y = ~ hiv_rxexp, 
                           name = 'HIV total<br>prescription<br>expenditure', 
                           text = hiv_rxexp, 
                           textposition = 'auto',
                           type = 'scatter', 
                           mode = 'lines+markers',
                           line = list (width = 2),
                           hovertemplate = paste('%{x}, $%{text}'),
                           hoverlabel = list(bordercolor = 'transparent', 
                                             font = list(size = 18, color = 'white')))
                               

fig4 <- fig4 %>% add_trace(x = ~ year, 
                           y = ~ hiv_income, 
                           name = 'HIV person<br>total income', 
                           text = hiv_income, 
                           textposition = 'auto',
                           type = 'scatter', 
                           mode = 'lines+markers',
                           line = list (width = 2),
                           hovertemplate = paste('%{x}, $%{text}'),
                           hoverlabel = list(bordercolor = 'transparent', 
                                             font = list(size = 18, color = 'white')))
                               

fig4 <- fig4 %>% add_trace(x = ~ year, 
                           y = ~ hiv_wage, 
                           name = 'HIV person<br>wage', 
                           text = hiv_wage, 
                           textposition = 'auto',
                           type = 'scatter', 
                           mode = 'lines+markers',
                           line = list (width = 2),
                           hovertemplate = paste('%{x}, $%{text}'),
                           hoverlabel = list(bordercolor = 'transparent', 
                                             font = list(size = 18, color = 'white')))
                               

fig4 <- fig4 %>% layout(title = "Medical Expenditures and Income among HIV Patients (2016-2020)", 
                        font = list(size = 14),
                        xaxis = list(title = "Years", 
                                     size = 12, 
                                     tickfont = list(size = 18)),
                        yaxis = list(title = "US Dollar", 
                                     size = 12, 
                                     tickfont = list(size = 12)),
                        margin = list(l = 20, r = 5, b = 30, t = 40))

fig4

5. Stacked area chart with original values - genders among HIV population

year <- c('2016', '2017', '2018', '2019', '2020')
hiv_male <- c(254645,   350036, 282527, 344585, 392146)
hiv_female <- c(92325, 140527, 133422, 47712, 103778)

data5 <- data.frame(year, hiv_male, hiv_female)

fig5 <- plot_ly (data5,
                 x = ~ year,
                 y = ~hiv_male,
                 name = 'Male', 
                 type = 'scatter',
                 mode = 'none',
                 stackgroup = 'one',
                 fillcolor = '#87CEFF',
                 hovertemplate = paste('%{y:,.3~s} <b><em>male</b></em> patients in %{x}'),
                 hoverlabel = list(bordercolor = 'black', 
                                   font = list(size = 18, color = 'black')))

fig5 <- fig5 %>% add_trace (y = ~ hiv_female,
                            name = 'Female', 
                            fillcolor = '#FFC0CB',
                            hovertemplate = paste('%{y:,.3~s} <b><em>female</b></em> patients in %{x}'),
                            hoverlabel = list(bordercolor = 'black', 
                                              font = list(size = 18, color = 'black')))
                            
fig5 <- fig5 %>% layout(title = 'Gender among HIV patients (2016-2020)',
                        font = list(size = 16),
                        yaxis = list(title = "Number of HIV patients", 
                                     size = 12, 
                                     tickfont = list(size = 12), 
                                     tickformat = ".3~s",
                                     tickvals = c(100000,200000,300000,400000,500000)),
                        xaxis = list(title = "Years", 
                                     size = 12, 
                                     tickfont = list(size = 18)),
                        margin = list(l = 50, r = 50, b = 50, t = 50))

fig5

6. Stacked area chart with cumulative values - education categories among HIV population

year <- c('2016', '2017', '2018', '2019', '2020')
elementary<- c(6699, 17122, 31790, 19258, 13302)
high_school <- c(171920, 214518, 140991, 133889, 194797)
college1_3 <- c(106588, 122049, 123008, 120942, 112365)
college4_more <- c(61762,   129630, 120160, 118208, 175460)


data6 <- data.frame(year, elementary, high_school, college1_3, college4_more)

fig6 <-  plot_ly(data6, 
                 x = ~ year, 
                 y = ~ high_school, 
                 name = "High School",
                 text = high_school,
                 textposition = 'auto',
                 type = 'scatter',
                 mode = 'none',
                 stackgroup = 'one',
                 groupnorm = 'percent',
                 hovertemplate = paste('%{text:,.3~s} HIV patients, %{y}'),
                 fillcolor = '#FFF68F')

fig6 <- fig6 %>% add_trace(y = ~ college1_3, 
                           name = "1-3 years college",
                           text = college1_3, 
                           textposition = 'auto',
                           hovertemplate = paste('%{text:,.3~s} HIV patients, %{y}'), 
                           fillcolor = '#87CEFA')

fig6 <- fig6 %>% add_trace(y = ~ college4_more,
                           name = "4+ years college",
                           text = college4_more,
                           textposition = 'auto',
                           hovertemplate = paste('%{text:,.3~s} HIV patients, %{y}'),
                           fillcolor = '#FFB6C1')

fig6 <- fig6 %>% add_trace(y = ~ elementary,
                           name = "Elementary",
                           text = elementary,
                           textposition = 'auto',
                           hovertemplate = paste('%{text:,.3~s} HIV patients, %{y}'),
                           fillcolor = '#528B8B')

fig6 <- fig6 %>% layout(title = "Proportions of Education Levels among HIV patients (2016 - 2020)",
                        font = list (size = 14), 
                        xaxis = list(title = "Years", 
                                     size = 12, 
                                     tickfont = list(size = 18)),
                        yaxis = list(title = "Cummulative Percentage", 
                                     size = 12, 
                                     tickfont = list(size = 12), 
                                     ticksuffix = '%', 
                                     tickformat = ".1f",
                                     tickvals = c(20,40,60,80,100)),
                       hovermode = "x unified", ## a single hover label describing one point per trace, for points at the same x value as the cursor
                       margin = list(l = 5, r = 5, b = 10, t = 40))

fig6

Acknowledgements

I would like to express my deepest gratitude to my site mentor, Dr. Mark Bounthavong for his constant support throughout this practicum. I appreciate his making sure this project challenging but interesting and beneficial at the same time, and that I was able to grow professionally and academically. These efforts would not have been possible without all his help.